Now that the 2020 election is officially over and Biden has been elected President of the United States, it is important that I reflect on my prediction model. I am excited to see what I could learn from it for the models I create in the future.
Let’s first recap my prediction model to get a better picture of what it was.
My prediction model was an ensemble model that predicted the popular vote share for each state.
Given that the Time For Change Model was an inspiration, I decided to focus my model on historical Republican vote share, as Trump was the incumbent in the 2020 election and incumbency is one of the predictors in the Time For Change Model.
I decided to separate America into three categories - red states, blue states, and battleground states - so that my model could adjust for overfitting. The groupings were based on how FiveThirtyEight grouped states.
My model used the following data:
In my model, I decided to classify approval, Q2 GDP growth, and turnout as fundamentals.
Thus, my ensemble model weighted the poll model (using only polls) by 0.96 and the fundamental model (using only fundamentals) by 0.04. I chose these weights based on FiveThirtyEight’s reasoning that polls become better predictors as the election nears, while fundamentals become noisier.
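To make the weighting concrete, here is a minimal sketch of how the two sub-model predictions could be blended for a single state. Only the weights 0.96 and 0.04 come from my model; the function name and the input vote shares are hypothetical and chosen purely for illustration.

```python
# Weights from the ensemble described above.
POLL_WEIGHT = 0.96
FUNDAMENTALS_WEIGHT = 0.04


def ensemble_vote_share(poll_prediction: float, fundamentals_prediction: float) -> float:
    """Blend two predicted vote shares (in percentage points) with fixed weights."""
    return POLL_WEIGHT * poll_prediction + FUNDAMENTALS_WEIGHT * fundamentals_prediction


# Hypothetical inputs for one state: the poll model predicts 48.0% for Trump
# and the fundamentals model predicts 52.0%.
blended = ensemble_vote_share(48.0, 52.0)
print(round(blended, 2))
```

Because the poll weight is so large, the blended value stays very close to the poll model's prediction, so any error in the polls passes almost directly into the final forecast.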
My final prediction using the ensemble model was that Biden would win 310 electoral votes while Trump would win 228, meaning Biden would become president-elect of the United States.
Overall, I am pretty satisfied with how my model turned out. Although this was my first election forecast and I did miss a few states, I was quite happy that I predicted some battleground states correctly.
Above is a comparison between my predictions and the actual results of the 2020 election. As you can see, the states that I got wrong were battleground states. However, I would like to point out that the predictive intervals for the battleground states did capture the true result.
Moreover, let’s take a look at the plot above, which plots the actual two-party vote share for Trump against my predictions for Trump. The blue points represent states Biden won and the red points represent states Trump won.
Furthermore, the map above shows the difference between Trump’s actual and predicted two-party vote share in each state. A negative difference means that Trump was overpredicted for that particular state, while a positive difference means that Trump was underpredicted for that particular state.
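The sign convention used in the map can be sketched as follows. This is only an illustration of the actual-minus-predicted difference described above; the function names and the example vote shares are hypothetical, not values from my model.

```python
def prediction_error(actual: float, predicted: float) -> float:
    """Actual minus predicted two-party vote share for Trump, in percentage points."""
    return actual - predicted


def describe_error(error: float) -> str:
    """Translate the sign of the error into the wording used for the map."""
    if error < 0:
        return "overpredicted"
    if error > 0:
        return "underpredicted"
    return "exact"


# Hypothetical state: Trump actually received 49.0% but the model predicted 51.0%.
error = prediction_error(actual=49.0, predicted=51.0)
print(error, describe_error(error))
```

In this made-up case the difference is negative, so the state would appear on the map as one where Trump was overpredicted.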
Now that we have gone over my prediction model, it is important to look at possible hypotheses for the inaccuracies seen in my model. My model seemed to incorrectly predict the results for battleground states in particular, and it is important that we pay attention to the reasons why. Below are my hypotheses for explaining the inaccuracies of my model:
One hypothesis to explain the inaccuracy of my model is that it failed to take into account recent voter trends in particular states. For example, Georgia and Texas have been trending blue recently, but my model failed to take note of this. This could be because my model relied heavily on historical polling averages: since Georgia and Texas were traditionally red states, my model would predict the same for 2020.
While my model took into consideration the expected increase in turnout rate for the 2020 election, it failed to take into consideration the turnout rates for different groups. For example, Stacey Abrams played a crucial role in Black voter turnout in Georgia in favor of Biden. The same goes for the large Latinx turnout in Arizona and Nevada, which also helped Biden. However, there were many Latinos who voted for Trump, particularly in South Texas and Florida. Given the large turnout rates for some of these groups, they can play a significant role in determining the election.
Another hypothesis is that my model relied heavily on inaccurate polls. Some polls in 2020 were fairly inaccurate because they were not representative of voters and suffered from non-response bias, particularly from conservatives. On average, polls were off by 2.5 points in battleground states. This may explain why my model had inaccuracies, especially since I weighted the poll model by 0.96 in my ensemble model.
Another hypothesis is that using the state Q2 GDP growth rate as a fundamental variable may have hurt Trump more than it should have, especially in battleground states and traditionally red states. This is because economic predictors were very noisy this year due to a recession caused by Trump’s handling of the COVID-19 pandemic. Since the 2020 economy was an anomaly, it would probably have been best not to use economic predictors in my model.